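Gait recognition refers to recognizing or identifying a person from body shape and walking style, derived from video data captured at a distance. It is widely used in crime prevention, forensic identification, and social security. However, to the best of our knowledge, most existing methods use appearance, pose, and temporal features without considering a learned temporal attention mechanism for fusing global and local information. In this paper, we propose a novel gait recognition framework, called Temporal Attention and Keypoint-guided Embedding (GaitTAKE), which effectively fuses temporal-attention-based global and local appearance features with temporally aggregated human pose features. Experimental results show that the proposed method achieves a new SOTA in gait recognition, with rank-1 accuracy of 98.0% (normal), 97.5% (bag), and 92.2% (coat) on the CASIA-B gait dataset, and 90.4% accuracy on the OU-MVLP gait dataset.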
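The rank-1 accuracy reported above can be illustrated with a minimal sketch. It assumes each probe sequence has already been reduced to a single feature vector and is matched to its nearest gallery vector by squared Euclidean distance. The function name `rank1_accuracy`, the toy identities, and the toy features are hypothetical and are not taken from the paper.

```python
def rank1_accuracy(probes, gallery):
    """Fraction of probes whose nearest gallery entry has the same identity.

    probes and gallery are lists of (identity, feature_vector) pairs.
    """
    hits = 0
    for probe_id, probe_feat in probes:
        # Nearest gallery entry under squared Euclidean distance.
        nearest_id, _ = min(
            gallery,
            key=lambda g: sum((a - b) ** 2 for a, b in zip(probe_feat, g[1])),
        )
        hits += nearest_id == probe_id
    return hits / len(probes)


# Hypothetical toy data: two identities, three probes (the last one is misidentified).
gallery = [("A", [0.0, 0.0]), ("B", [10.0, 10.0])]
probes = [("A", [1.0, 0.0]), ("B", [9.0, 9.0]), ("A", [8.0, 9.0])]
accuracy = rank1_accuracy(probes, gallery)
print(accuracy)
```

Real benchmarks such as CASIA-B additionally split probes by walking condition and view angle, which this sketch leaves out.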
translated by 谷歌翻译
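Despite the recent performance gains of automatic speech recognition (ASR) methods, such methods do not ensure proper casing and punctuation in their outputs. This problem has a significant impact on comprehension by both natural language processing (NLP) algorithms and humans. Capitalization and punctuation restoration is therefore essential in preprocessing pipelines for raw text input. For low-resource languages such as Vietnamese, public datasets for this task are scarce. In this paper, we contribute a public dataset for capitalization and punctuation restoration in Vietnamese, and we propose a joint model for the two tasks, named JointCapPunc. Experimental results on the Vietnamese dataset show the effectiveness of our joint model compared with single models and previous joint learning models. We publicly release the dataset and the implementation of our model at https://github.com/anhtunguyen98/jointcappund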
We introduce an approach to the answer-aware question generation problem. Rather than relying only on the capability of strong pre-trained language models, we observe that information about answers and questions can be found in a few relevant sentences of the context. Based on this observation, we design a model with two modules: a selector and a generator. The selector forces the model to focus more on the sentences relevant to an answer, providing implicit local information. The generator produces questions by implicitly combining local information from the selector with global information from the whole context encoded by the encoder. The model is trained jointly to take advantage of latent interactions between the two modules. Experimental results on two benchmark datasets show that our model outperforms strong pre-trained models on the question generation task. The code is also available (shorturl.at/lV567).
Air pollution is an emerging problem that needs to be solved, especially in developed and developing countries. In Vietnam, air pollution is also a concern in big cities such as Hanoi and Ho Chi Minh City, where it comes mostly from vehicles such as cars and motorbikes. To tackle the problem, this paper develops a solution that estimates emitted PM2.5 pollutants by counting the number of vehicles in traffic. We first surveyed recent object detection models and developed our own traffic surveillance system. The observed traffic density showed a trend similar to the measured PM2.5 with a certain time lag, suggesting a relation between traffic density and PM2.5. We further express this relationship with a mathematical model that estimates the PM2.5 value from the observed traffic density. The estimated result showed a strong correlation with the measured PM2.5 plots in the urban area context.
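The lagged relation between traffic density and PM2.5 can be probed with a simple lagged-correlation search, sketched below. This is only an illustration under stated assumptions: the abstract does not give the paper's mathematical model, so the linear relation, the two-step lag, the function names `pearson` and `best_lag`, and all data values here are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(traffic, pm25, max_lag):
    """Return (lag, r) maximizing the correlation of traffic[t] with pm25[t + lag]."""
    best = (0, pearson(traffic, pm25))
    for lag in range(1, max_lag + 1):
        r = pearson(traffic[:-lag], pm25[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical readings: PM2.5 follows traffic density two time steps later.
traffic = [10, 30, 60, 40, 20, 15, 35, 70, 50, 25]
pm25 = [22, 21] + [20 + 0.5 * v for v in traffic[:-2]]
lag, r = best_lag(traffic, pm25, max_lag=4)
print(lag, round(r, 3))
```

Once a lag is chosen, a regression of PM2.5 on the lagged traffic density is one simple way to turn the observed relation into an estimator.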
Pareto Front Learning (PFL) was recently introduced as an effective approach to obtain a mapping function from a given trade-off vector to a solution on the Pareto front, which solves the multi-objective optimization (MOO) problem. Because of the inherent trade-off between conflicting objectives, PFL offers a flexible approach in many scenarios in which decision makers cannot specify a preference for one Pareto solution over another and must switch between them depending on the situation. However, existing PFL methods ignore the relationship between the solutions during the optimization process, which hinders the quality of the obtained front. To overcome this issue, we propose a novel PFL framework, namely \ourmodel, which employs a hypernetwork to generate multiple solutions from a set of diverse trade-off preferences and enhances the quality of the Pareto front by maximizing the Hypervolume indicator defined by these solutions. Experimental results on several MOO machine learning tasks show that the proposed framework significantly outperforms the baselines in producing the trade-off Pareto front.
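For readers unfamiliar with the Hypervolume indicator, the sketch below computes it for the two-objective minimization case: the area dominated by a set of solutions and bounded by a reference point. It illustrates the indicator only, not the hypernetwork training procedure. The function name `hypervolume_2d` and the example points are hypothetical.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref`, both objectives minimized."""
    # Keep only points that strictly improve on the reference in both objectives.
    candidates = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]
    for f1, f2 in candidates:  # ascending in the first objective
        if f2 < best_f2:  # skip points dominated by an earlier one
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


# Hypothetical front of three mutually non-dominated solutions.
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(4.0, 4.0)))
```

A larger value means the solution set covers more of the objective space, which is why maximizing it pushes solutions toward the front while keeping them spread out.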
Online Class Incremental Learning (CIL) is a challenging setting in Continual Learning (CL), wherein data of new tasks arrive in incoming streams and online learning models need to handle these streams without revisiting previous ones. Existing works use a single centroid, adapted with the incoming data stream, to characterize a class. This approach may show limitations when the incoming data stream of a class is naturally multimodal. To address this issue, we first propose an online mixture model learning approach based on properties of the mature optimal transport theory (OT-MM). Specifically, the centroids and covariance matrices of the mixture model are adapted incrementally according to incoming data streams. The advantages are two-fold: (i) we can characterize complex data streams more accurately, and (ii) by using the centroids produced by OT-MM for each class, we can estimate the similarity of an unseen example to each class more reasonably at inference time. Moreover, to combat catastrophic forgetting in the CIL scenario, we further propose Dynamic Preservation. In particular, after applying the dynamic preservation technique across data streams, the latent representations of the classes in the old and new tasks become more condensed and more separated from each other. Together with a contraction feature extractor, this technique helps the model mitigate catastrophic forgetting. Experimental results on real-world datasets show that our proposed method significantly outperforms the current state-of-the-art baselines.
The instrumental variable (IV) approach is widely used to estimate the causal effect of a treatment on an outcome of interest from observational data with latent confounders. A standard IV is expected to be related to the treatment variable and independent of all other variables in the system. However, searching for a standard IV directly from data is challenging because of these strict conditions. The conditional IV (CIV) method has been proposed to allow a variable to be an instrument conditional on a set of variables, which widens the choice of possible IVs and enables broader practical applications of the IV approach. Nevertheless, there is no data-driven method to discover a CIV and its conditioning set directly from data. To fill this gap, we propose to learn representations of the information of a CIV and its conditioning set from data with latent confounders for average causal effect estimation. By taking advantage of deep generative models, we develop a novel data-driven approach that simultaneously learns the representation of a CIV from measured variables and generates the representation of its conditioning set given measured variables. Extensive experiments on synthetic and real-world datasets show that our method outperforms existing IV methods.
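As background for the standard IV setting described above, the sketch below contrasts a naive regression estimate with the classical IV (Wald) estimate, cov(Z, Y) / cov(Z, T), on simulated data with a latent confounder. It is not the paper's CIV representation-learning method. The data-generating process, the coefficient values, and the names `cov` and `simulate` are assumptions chosen for illustration.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(n, true_effect=2.0, seed=0):
    """Hypothetical linear system: Z -> T -> Y, with latent U affecting T and Y."""
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument, independent of the confounder
        ui = rng.gauss(0, 1)  # latent confounder (never observed)
        ti = zi + ui + rng.gauss(0, 1)
        yi = true_effect * ti + 3.0 * ui + rng.gauss(0, 1)
        z.append(zi)
        t.append(ti)
        y.append(yi)
    return z, t, y


z, t, y = simulate(20000)
ols_estimate = cov(t, y) / cov(t, t)  # biased upward by the confounder
iv_estimate = cov(z, y) / cov(z, t)   # close to the true effect of 2.0
print(round(ols_estimate, 2), round(iv_estimate, 2))
```

The IV estimate is only valid here because the simulated instrument satisfies the strict conditions by construction, which is exactly what is hard to guarantee when an instrument must be found in real data.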
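Mistaking medication is one of the risks that can lead to unpredictable consequences for patients. To mitigate this risk, we develop an automatic system that correctly matches pills to prescriptions from images taken with mobile devices. Specifically, we define a so-called pill-prescription matching task, which attempts to match images of the pills taken with the pills named in the prescription. We then propose PIMA, a novel approach that uses a graph neural network (GNN) and contrastive learning to address the target problem. In particular, the GNN is used to learn the spatial correlation between text boxes in the prescription, thereby highlighting the text boxes that carry pill names. In addition, contrastive learning is employed to facilitate modeling of the cross-modal similarity between the textual representations of pill names and the visual representations of pill images. We conduct extensive experiments and show that PIMA outperforms baseline models on a real-world dataset of pill and prescription images that we constructed. Specifically, PIMA improves accuracy from 19.09% to 46.95% compared with other baselines. We believe our work can open new opportunities to build new clinical applications and to improve medication safety and patient care.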
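The cross-modal contrastive idea can be sketched with an InfoNCE-style loss, where each pill-name embedding should score highest against its own pill-image embedding. The abstract does not specify PIMA's actual loss, so the loss form, the temperature value, the function names `cosine` and `info_nce`, and the toy embeddings are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_embs, image_embs, temperature=0.1):
    """Mean cross-entropy of picking image i for text i among all images."""
    total = 0.0
    for i, text in enumerate(text_embs):
        logits = [cosine(text, img) / temperature for img in image_embs]
        top = max(logits)  # subtract the maximum for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(text_embs)


# Hypothetical 2-D embeddings for two pill names and two pill images.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[0.9, 0.1], [0.1, 0.9]]
swapped_images = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(texts, aligned_images) < info_nce(texts, swapped_images))
```

Minimizing such a loss pulls matching text and image representations together and pushes mismatched pairs apart, which is the behavior the abstract attributes to contrastive learning.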
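In many fields of scientific research and real-world applications, unbiased estimation of causal effects from non-experimental data is crucial for understanding the mechanism underlying the data and for making decisions on effective responses or interventions. A great deal of research has addressed this challenging problem from different angles. For causal effect estimation from data, assumptions such as the Markov property, faithfulness, and causal sufficiency are always made. Even under these assumptions, full knowledge, such as a set of covariates or an underlying causal graph, is still required. A practical challenge is that in many applications no such full knowledge, or only partial knowledge, is available. In recent years, research has emerged that uses search strategies based on graphical causal modeling to discover useful knowledge from data for causal effect estimation under some mild assumptions, and it has shown promise in tackling this practical challenge. In this survey, we review these methods and focus on the challenges that data-driven methods face. We discuss the assumptions, strengths, and limitations of the data-driven methods. We hope this review will motivate more researchers to design better data-driven methods based on graphical causal modeling for the challenging problem of causal effect estimation.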
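In video compression, coding efficiency is improved by reusing pixels from previously decoded frames via motion and residual compensation. We define two levels of hierarchical redundancy in video frames: 1) first-order: redundancy in pixel space, i.e., the similarity of pixel values across neighboring frames, which is effectively captured by motion and residual compensation; and 2) second-order: redundancy in motion and residual maps caused by smooth motion in natural videos. While most of the existing neural video coding literature addresses first-order redundancy, we tackle the problem of capturing second-order redundancy in neural video codecs via predictors. We introduce generic motion and residual predictors that learn to extrapolate from previously decoded data. These predictors are lightweight and can be used with most neural video codecs to improve their rate-distortion performance. Moreover, while RGB is the dominant colorspace in the neural video coding literature, we introduce general modifications that allow neural video codecs to embrace the YUV420 colorspace, and we report YUV420 results. Our experiments show that using our predictors with a well-known neural video codec yields 38% and 34% bitrate savings in the RGB and YUV420 colorspaces, respectively, measured on the UVG dataset.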
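Second-order redundancy can be illustrated with a toy motion predictor. The abstract describes learned predictors. The sketch below swaps in a hand-written linear extrapolation from the two previously decoded motion fields, purely to show why predicting motion leaves less information to code. The function names and the motion values are hypothetical.

```python
def extrapolate_motion(prev2, prev1):
    """Linear extrapolation of per-block motion vectors: 2 * prev1 - prev2."""
    return [
        (2 * x1 - x2, 2 * y1 - y2)
        for (x2, y2), (x1, y1) in zip(prev2, prev1)
    ]


def subtract(field_a, field_b):
    """Element-wise difference of two motion fields."""
    return [(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(field_a, field_b)]


def energy(field):
    """Sum of squared components, a rough proxy for coding cost."""
    return sum(x * x + y * y for x, y in field)


# Hypothetical motion fields (dx, dy) for two blocks over three frames.
motion_t2 = [(1.0, 0.0), (2.0, 1.0)]  # decoded at time t - 2
motion_t1 = [(1.5, 0.0), (2.5, 1.0)]  # decoded at time t - 1
motion_t = [(2.0, 0.1), (3.0, 1.0)]   # current motion to be coded

predicted = extrapolate_motion(motion_t2, motion_t1)
to_code = subtract(motion_t, predicted)  # only this difference is transmitted
print(energy(motion_t), energy(to_code))
```

Because the decoder can form the same prediction from data it has already decoded, only the small difference needs to be sent, which is where the bitrate saving comes from.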